Abstract


The rise of the accountability movement in education has resulted in the 
proliferation of school report cards, school ratings and rankings, and other kinds of 
performance reporting for public consumption and policy use. To understand the 
strengths and limitations of school rating systems and the role they play in shaping 
public perceptions and school improvement practices, this paper situates rating 
systems within the broader field of comparative organizational assessments and 
neo-institutional theory; describes school rankings and rating systems in use by 
states and consumer-oriented enterprises; and details four aspects of school ratings 
(measurement, transformation, integration, and presentation) that affect their use 
and interpretation. 
Introduction


The rise of the accountability movement in 
education has resulted in the proliferation of school 
report cards, school ratings and rankings, and
other kinds of performance reporting for public
consumption and policy use (Coburn & Turner,
2012). These performance reports emerge from
a long history of increasing quantification of the
performance of private enterprises and public
agencies (Espeland & Sauder, 2007) and are intended
to increase transparency and lower asymmetries
of information between the public and seemingly
opaque educational agencies (Gormley & Weimer, 
1999; Haertel & Herman, 2005). School performance 
reports became widespread due to the passage of No 
Child Left Behind (NCLB), the 2001 re-authorization 
of the federal Elementary and Secondary Education 
Act (ESEA), which mandated public reporting of 
standardized test score performance for public-sector
schools. With the waiver of strict NCLB
reporting requirements initiated in 2011 and the 
re-authorization of ESEA as the Every Student 
Succeeds Act (ESSA) in 2015, accountability systems 
and school ratings have diversified further, drawing 
on increasingly detailed data systems and the 
dissemination of advanced statistical techniques to 
merge test performance and other kinds of measures 
into more comprehensive school assessments (ESSA, 
2015; US Department of Education, 2015). 


These reporting systems have engendered a host of 
positive and negative consequences for school and 
teacher practices, public administration, and family 
choice (Booher-Jennings, 2005; Colyvas, 2012; 
Diamond & Cooper, 2007; Hastings & Weinstein, 
2008). School administrators and individual 
teachers have adapted to the pressure of high-stakes
accountability reporting in various ways,
from systematic attempts to improve instruction to 
manipulating test results or other reports of school 
performance (Heilig & Darling-Hammond, 2008; 
Herman & Haertel, 2005). In part to address the 
unintended consequences of performance reporting, 
accountability policies have shifted away from a 
narrow set of increasingly punitive responses to 
poor school performance to more comprehensive 
and ongoing supports to all schools to improve the 
education they provide (Martin, Sargrad, & Batel,
2016). 


To understand the role that rating systems play in 
guiding accountability supports and policy decisions, 
and the effects that performance ratings and rankings 
have on teacher and administrator practices within 
schools, it is important to understand what the 
ratings or rankings are intended to accomplish and 
the key methodological and design decisions that 
are involved in crafting them. To contribute to this 
understanding, this paper situates school rating 
systems within the broader field of comparative 
organizational assessments and neo-institutional 
theory; describes school rankings and rating systems
in use by states and consumer-oriented enterprises;
and details four aspects of school ratings systems
that affect their use and interpretation. Examining
school rating systems comparatively and in light
of broader work on organizational assessments
can provide an opportunity to foster deeper, more
meaningful conversation about the appropriate uses 
of performance measures for school and student 
improvement. 


Paying particular attention to high schools, we focus 
on school ratings systems that provide a score, grade, 
rank, or other rating to individual schools based
on their performance on various student outcome
measures. This definition excludes accountability
or school improvement categories such as NCLB’s
designations related to failing to meet adequate yearly
progress (AYP) multiple years in a row, and similar
classifications implemented by states in response
to approval of NCLB waivers and currently being
designed and implemented under ESSA. These 
designations are part of larger accountability systems 
that tie performance measurement (including, 
oftentimes, summary school ratings), accountability 
categories, and school supports or interventions 
together. A full rendering of the landscape of 
accountability systems is beyond the scope of this 
paper, but the usefulness and efficacy of school 
ratings must be ultimately evaluated in this fuller 
context. That said, school ratings themselves have 
significant implications for schools and educational 
agencies. To understand why this is the case, it is 
helpful to begin by delineating how school ratings fit 
within the wider landscape of performance reports
known as comparative organizational assessments. 


School Ratings Systems in the Context of 
Comparative Organizational Assessments 


Elementary and secondary school ratings and 
ranking systems are one example of a much broader 
phenomenon of comparative organizational 
assessments that have grown dramatically in recent 
decades in response to the explosion in available data 
and the increasing ease of computing and publishing 
(frequently online) ever more complex evaluations of 
organizations, institutions, and government bodies 
(Coe & Brunet, 2006). Comparative organizational 
assessments are any type of cross-organization or 
cross-institution report of comparable metrics on 
performance, processes, rules, resources, or other 
factors relevant to the evaluation of the target 
enterprise (Gormley & Weimer, 1999). The target 
institution can either be (a) governing jurisdictions 
such as countries, states, counties, school districts,
or other agencies at the same level; or (b) individual
organizations such as hospitals, other health service 
providers, early child care providers, elementary or 
secondary schools, colleges or universities, graduate 
departments, businesses or corporations, or charities. 
Assessments are produced across many fields of 
public policy and governance, including health care, 
economic policy, the environment, and education (see 
Coe & Brunet [2006] for a review of some prominent 
reports, for example). 


In addition to applying to different kinds of 
organizations and fields, comparative assessments 
can take many different forms, characterized by 
both the technical quality of the measures involved 
and more qualitative aspects of their design that 
improve communicability and heighten impact 
(Gormley & Weimer, 1999; Stinchcombe, 2001). 
These forms include organizational report cards, 
“scorecards,” benchmarking, rankings, ratings, or
some combination of these (Coburn & Turner, 2012; 
Gormley & Weimer, 1999; Kaplan & Miyake, 2010; 
Matthews, 1998). Reports may be published as a 
single compendium document covering all assessed 
organizations (particularly when the number of 
organizations is small, as when states are given
“report cards”) or as individual documents per
organization—both types of which are increasingly
made available online in fixed or interactive formats.


RTI Press Publication No. OP-0046-1709. Research Triangle Park, NC: RTI Press.


RTI Press: Occasional Paper


A key distinction among different types of 
comparative organizational assessments is between 
those which provide summative ratings (a score, 
grade, or rank) and those which provide multiple 
comparative metrics without an overall rating. 
Scorecards (including the “balanced scorecard” 
approach of Kaplan and Miyake [2010]) and
many states’ school report cards initially created
in response to the requirements of NCLB fall
in the latter category. For example, California's
School Accountability Report Cards (SARCs) and
California School Dashboard provide information
about school enrollment, demographics, student
performance on state tests, adequate yearly progress
(AYP) determinations, graduation rates, Title I status,
school staff data, expenditures, and a variety of other 
information, including facilities information and 
physical fitness test results—without providing any 
single grade or rating of the school’s performance 
across measures. In contrast, summative rating 
systems yield an easily understood overall assessment 
of the organization's quality or performance. Because 
of the reduction of an organization's processes and 
outcomes to a single rating or grade, summative 
ratings are highly visible outcomes that can exert a 
strong influence on public perceptions, organizational 
goals, and individual and corporate practice 
(Jacobsen, Snyder, & Saultz, 2014). Although report 
cards or scorecards that do not report a summary 
measure can be useful for internal management
and organizational improvement and may be better
avenues for strategic planning and administrative 
purposes, their potential to have a galvanizing impact 
on consumer behavior and public policy is much 
lower (Gormley & Weimer, 1999). 


School ratings are thus an individual, summative type 
of comparative organizational assessment applied
to public and private schools at all levels. These
rating systems use various combinations of student
performance, student population, organizational 
resources, and other factors to create an overall 
summary grade, score, ranking, or rating that 
explicitly evaluates individual schools or districts 
relative to others within a jurisdiction or group of
comparable organizations. The oldest, most well-known
and well-studied of such systems are US News
& World Report’s (US News) annual Best Colleges
rankings, which use surveys of college and university
administrators and publicly available information
from the federal government to rank institutions
of higher education within peer groups (such as
National Universities and Regional Colleges). A 
variety of other higher education rankings and ratings 
(such as rankings of graduate programs or schools) 
have also been produced by other organizations and 
by US News itself. At the elementary and secondary 
levels, there are multiple ratings of public schools 
produced by states and organizations such as 
Greatschools.org, US News, Newsweek, and Niche.com,
in addition to state-level report cards such as
Education Week’s Quality Counts and the federal 
government's own ranking of state performance on 
the National Assessment of Educational Progress 
(self-labeled as, in fact, “the Nation’s Report Card”). 


Comparative organizational assessments serve 
different purposes depending on the reporting 
organization and the nature of the assessment.
The “balanced scorecard” approach, for example,
is designed not only as a mechanism for tracking
progress and ensuring accountability but also
to align with the strategic goals of specific
organizations (Kaplan & Miyake, 2010; Muller,
2015); indeed, such scorecards are primarily for internal use,
not for external evaluation and comparison. Coe and
Brunet (2006) argue that different types of report 
issuers (governments, commercial enterprises, 
academics, foundations, and public interest groups) 
strongly shape the goals and design of assessments. 
Public interest groups in the environmental field, for 
example, sometimes pursue a strategy of dramatizing 
failure to heighten alarm about a particular issue
and drive legislative and policy agendas. Although
such manipulation is not prevalent in the field of
education, it is important to consider the different
purposes of organizations such as Greatschools.org
(a nonprofit that licenses its ratings to other
organizations, such as real estate website Zillow.com),
US News (a journalistic publication and website), and
individual states (which possess different resources,
challenges, and policy priorities) in interpreting their
ratings.


Taken together, the target field and institution,
the technical and qualitative design features of the
assessments—specifically whether a summative rating 
is provided—and the issuer's goals and resources all 
influence the nature of the comparative assessment 
and its usefulness to consumers, policymakers, and 
organizational leaders. With this in mind, it is helpful 
to turn toward some of what research has uncovered 
about the genesis of and reaction to comparative 
organizational assessments in education. 


Impacts of School Ratings, Accountability, 
and Performance Metrics on School 
Practices 


In addition to the literature describing the ideal
uses and construction of organizational assessments
(Gormley & Weimer, 1999; Moynihan, 2008),
a burgeoning research literature has sought to
understand the impact of accountability reporting
on educators and educational institutions. In general,
this research draws from theoretical perspectives
in organizational theory, particularly the new
institutionalism of sociology, which emphasizes
how such formal codes and reported measures
(everything from national flags to economic statistics,
and including educational assessments) promote
uniformity in outward appearance (isomorphism)
but coexist with local, contextualized practices
that are resistant to external pressure (DiMaggio
& Powell, 1983; Meyer & Rowan, 1977). In other
words, institutionalism as applied in organizational
theory has observed a decoupling between the formal
structures organizations claim adherence to and
the actual routines and beliefs of the organization
and its employees. This allows organizations to gain
legitimacy in the eyes of the public or principal
constituencies as members in good standing
while minimizing disruption to preferred modes
of business. According to this perspective, the
development of organizational metrics, report cards, 
and ratings would yield superficial conformity to 
overall institutional goals (as expressed in the chosen 
measures), but not necessarily change attitudes or
behaviors within local contexts. 


However, research into the impact of comparative 
organizational assessments within education 
challenges this narrative. For example, Espeland
and Sauder (2007; Sauder & Espeland, 2009) found
that law school administrators felt both external
and internal, self-imposed pressure to align their
organizations with the measures promoted by US 
News’ rankings of best law programs. Such effects 
can be positive or negative; numerous studies have 
found that the emphasis on accountability and testing 
codified by NCLB within elementary and secondary 
schools has led to negative effects such as ignoring 
low-achieving students, reducing arts and other kinds 
of enrichment instruction, encouraging low-achievers 
to skip the tests, and even criminal cheating practices 
(Booher-Jennings, 2005; Colyvas, 2012; Diamond & 
Cooper, 2007; Heilig & Darling-Hammond, 2008). 
Both “gaming” ratings systems and significant 
changes in teaching emphasis have been observed; the 
former could have been predicted by institutionalism, 
while the latter indicates “recoupling,” or the 
alignment between individual organizational 
procedures and external institutional forms. 


Likewise, investigations into the use of data within
schools have suggested that educators react in a variety
of ways to accountability policies and performance 
reporting. Teachers can experience both cultural and 
technical barriers to data use that limit the impact of 
performance reporting on actual practice (Ingram, 
Louis, & Schroeder, 2004) as well as come to adopt 
the conceptual framework of performance-based 
accountability in their thinking about and approach 
to classroom instruction and student development 
(Spillane, Parise, & Sherer, 2011). More ominously, 
in adapting themselves to the pressures of external 
accountability, schools have been found to “game the 
system” of accountability to, for example, exclude 
likely low-performing students (Heilig & Darling-Hammond,
2008) or even change test answers
and scores of students (Wilson, Bowers, & Hyde,
2011). The important point is that organizational 
assessments, and school performance reports in 
particular, seem to exhibit “tight coupling” with the 
actual activities of practitioners. Both resistance to 
and embrace of data tools and accountability thinking
can lead to shifts in organizational routines (Spillane, 
2012). 


This conclusion implies that rating systems
must be understood in terms that go beyond the
statistical or technical aspects of their construction.
Qualities outlined by Gormley and Weimer
(1999) and Stinchcombe (2001)—such as validity,
comprehensibility, and the intent and capacity of 
issuing organizations—play a role in the reception 
of ratings and the reaction of practitioners. We must 
apply conscientious attention to the construction, 
use, and misuse of ratings systems if they are to have 
a constructive impact in school improvement. To 
further this goal, the paper next turns to a closer 
examination of state and consumer-oriented high 
school rating systems and then the key technical 
qualities that affect their use and interpretation. 


A Review of State and Consumer-Oriented 
School Rating Systems 


Multiple organizations now release school ratings. 
There are two primary types of issuers: (1) state 
departments of education or public instruction, 
which report school ratings relative to other
schools within their state (Martin et al., 2016); and
(2) consumer-oriented enterprises which publish 
ratings or rankings for schools across the country. 
Consumer-oriented enterprises include the nonprofit
Greatschools.org (which is supported by advertising
and licensing revenue as well as foundations and
grants), journalistic organizations including
US News, The Washington Post, and Newsweek, and
commercial websites such as Niche.com. Also, a
variety of award programs have produced assessments 
of schools or districts that meet some of the criteria 
for a rating system but stop short of a summative rating. Such
efforts include the Department of Education’s Blue
Ribbon Schools Program, which awards
distinctions to public and private high schools based
on academic performance and gap reduction; the 
Broad Prize for Public Education and the Broad Prize 
for Charter Management Organizations, which until 
recently presented awards to public school districts 
or charter organizations, respectively, demonstrating 
high overall performance and reduced achievement
gaps; and the State Collaborative on Reforming
Education (SCORE) Prize, which recognizes high-performing
schools and districts in Tennessee. These
awards rely on a mix of quantitative analysis and 
qualitative investigation but do not necessarily report 
publicly on their internal analyses. 


State Ratings Systems 


All states publish school report cards on their public 
elementary and secondary schools; however, not
all states have implemented a rating or grade that
summarizes the school’s overall performance (not 
including accountability categories or formerly 
required NCLB designations of “meeting” or “not 
meeting” AYP). States will be required to add these 
summative ratings under ESSA. Indeed, several states 
are currently revisiting their accountability systems in 
light of ESSA requirements and regulations. 


Although they predate NCLB in some states,
most states’ school report cards, scorecards, and
performance reports arose in response to NCLB
requirements that school performance be publicly
reported. In 2011, recognizing the flaws in NCLB's
requirements that all students be proficient by
2014, the US Department of Education initiated a
waiver program that allowed states freedom from
some NCLB requirements in exchange for a plan
to implement new systems of college and career
readiness assessment, accountability, teacher and 
principal evaluations, and low-burden administrative 
reporting (US Department of Education, 2015). Many 
states took this opportunity to revamp or expand 
their school ratings systems. As of the fall of 2015,
44 states¹ had requested and received approval for
NCLB flexibility, with another two requests (Iowa and
Wyoming) under review at the time. Only California,
Montana, Nebraska, North Dakota, and Vermont did
not request waivers.


¹ Count includes Washington, DC, but not Puerto Rico or the Bureau of
Indian Education.


With the passage of ESSA in December 2015, federal
requirements for accountability changed. States
are now required to report information beyond
assessment results; disaggregate data by additional
student subgroups; place “substantial weight” on 
certain indicators; and report a summative rating. The 
indicators ESSA requires include test performance
in English language arts (ELA) and mathematics;
a second academic indicator such as achievement
growth; progress toward achieving English language 
proficiency; one or more school quality (e.g., school 
climate) or student success measures (i.e., college and 
career readiness measures); and, for high schools, 
graduation rates. These requirements are to take effect 
for the 2017-2018 school year. 


Based on a review of state education websites, 
accountability documents, school-level report 
cards, and NCLB waiver requests, Table 1 presents 
information (as of June 2017) on the use of 
summative school ratings in each of the states, the 
components of the summary ratings, and the name 
the state has given the core rating. The components 
listed are given generic names that cover a variety of 
specific measures that states may use. For example, 
“achievement” can include multiple subject areas 
(although usually just English and mathematics); 
“progress” refers to both student-level achievement 
growth models and year-over-year increases in 
proficiency among aggregate student groups; and 
“college and career readiness” includes a variety
of measures such as performance on Advanced
Placement (AP) or International Baccalaureate (IB) 
courses, SAT or ACT test scores, and matriculation 
rates at postsecondary institutions. The listing of 
components includes elements specific to high 
schools, such as graduation rates and college and 
career readiness indicators; elementary and middle 
school ratings in most states are identical except that
they omit these metrics.


Table 1. Summary of school rating systems in use or planned by states

State | Summary rating | Components of summary rating | Name
Alabama | A-F | achievement, growth, gaps, graduation, attendance, and college and career readiness | School Grades
Alaska | Stars (1-5) | achievement, growth, gaps, graduation, attendance, and college and career readiness | Alaska School Performance Index
Arizona | A-F | achievement, growth, graduation, and dropout | A-F Letter Grade Accountability System
Arkansas | A-F | achievement, growth, gaps, graduation | Arkansas School Grading System
California | None | |
Colorado | 0%-100% with classification | achievement, growth, gaps, graduation, dropout, and postsecondary and workforce readiness | School Performance Framework
Connecticut | 0-100 with classification | achievement, growth, graduation, college and career readiness, absenteeism, physical education, arts access | Next Generation Accountability System
Delaware | 0-500 | achievement, growth, graduation, and college and career readiness | Delaware School Success Framework
District of Columbia | 0-100+ with classification | achievement, growth, graduation, attendance, and test participation | School Index Score
Florida | A-F | achievement, growth, college and career readiness, and graduation rate | School Grades
Georgia | A-F | achievement, growth, gaps, and bonus measures | College and Career Readiness Performance Index (CCRPI)
Hawaii¹ | 0-400 with classification | achievement, growth, gaps, and college and career readiness | Strive HI School Performance Report
Idaho | None | |
Illinois | 0-300 with classification (forthcoming) | achievement, growth, progress, graduation, bonus measures, and test participation | Multiple Measures Index
Indiana | A-F | achievement, growth, graduation, and college and career readiness | PL221 (Public Law 221) Grades
Iowa | 0-100 with classification | achievement, growth, gaps, graduation, and staff retention | Performance Index
Kansas | None | |
Kentucky | Classification based on matrix (forthcoming) | achievement, gaps, graduation, college and career readiness, and opportunity to learn | To be determined
Louisiana | A-F | achievement, growth, graduation, college readiness, and bonus measures | School Performance Score
Maine | A-F | achievement, growth, and graduation | Maine School Performance Grading System
Maryland¹ | Strands (1-5) based on 0-2+ scale | achievement, growth, gaps, and college readiness | School Progress Index
Massachusetts | Levels (5-1) with labels | achievement, growth, gaps, graduation, dropout, college and career readiness, and bonus measures | Progress and Performance Index
Michigan | Colors (5) | achievement, progress, gaps, and graduation rate | Top-to-Bottom ranking
Minnesota² | 0%-100% | achievement, growth, gaps, and graduation | Multiple Measures Rating
Mississippi | A-F | achievement, growth, graduation, and college and career readiness | Accountability Grades
Missouri | 0-100 | achievement, graduation, college and career readiness, and attendance | Annual Performance Report (APR) Score
Montana | None | |
Nebraska | None | |
Nevada¹ | Stars (1-5) | achievement, growth, gaps, graduation, college and career readiness, and attendance | Nevada School Performance Framework
New Hampshire | None | |
New Jersey | None | |
New Mexico | A-F | achievement, growth, graduation, college and career readiness, opportunity to learn, and bonus measures | School Grading
New York | None | |
North Carolina | A-F | achievement and growth | School Performance Grades
North Dakota | None | |
Ohio | A-F (forthcoming) | achievement, growth, gaps, graduation, and college and career readiness | Ohio School Report Cards
Oklahoma | A-F | achievement, growth, and bonus measures of graduation, college and career readiness, and attendance | A-F School Grading System
Oregon | None | |
Pennsylvania | 0-100+ with colored icons | achievement, growth, gaps, graduation, attendance, and bonus measures of college and career readiness | Building Level Academic Score
Rhode Island | 20-100 with classification | achievement, gaps, progress, and graduation | Composite Index Score
South Carolina¹,³ | A-F | achievement, progress, graduation, and test participation |
South Dakota | 0-100 with classification | achievement, growth, graduation, and college and career readiness | School Performance Index
Tennessee | None | |
Texas | None | |
Utah | A-F | achievement, growth, graduation, college and career readiness, and test participation | School Grade
Vermont | None | |
Virginia | None | |
Washington | Levels (1-10) | achievement, growth, graduation, and college and career readiness | Washington Achievement Index
West Virginia | A-F | achievement, growth, graduation, college and career readiness, and attendance | West Virginia School Accountability System
Wisconsin | 0-100 with classification | achievement, growth, gaps, graduation, college and career readiness, attendance, and test participation | Overall Accountability Score
Wyoming | None | |

¹ For school year 2015-2016 (the most recent year of reporting), Hawaii, Maryland, Nevada, Oregon, and South Carolina did not implement their rating system, pending revisions in response to ESSA.
² Minnesota planned to suspend their school rating system in July 2017, pending revisions in response to ESSA. Kentucky continued reporting their old ratings (a 0-100 index with classification) in July 2017, but plans a transition to a new system.
³ South Carolina suspended its ratings after the 2015-2016 school year.
Note: Components include measures that only apply to high schools (graduation; college and career readiness and its variants).
Sources: Elementary and Secondary Education Act (ESEA) Flexibility Waivers; State school report cards and accountability websites; Martin, Sargrad, & Batel (2016).


Fifteen states provide no summative ratings for
their schools or do not have published plans for a
summative rating. These states do report a variety of
profile and outcome measures at the school level—
and in some cases, like Texas, report summative
ratings for certain dimensions like achievement or
growth—but they have not taken the additional
step of constructing an overall summary rating
for schools. In other cases, individual districts or
regional educational agencies may report their own
grading system: although not systematically covered 
here, one example is the California Office to Reform 
Education’s (CORE) consortium of districts in 
northern California, which applied for and received 
its own NCLB waiver to implement a “School Quality 
Improvement Index” that uniquely incorporates 
measures of social-emotional skills as outcomes. 
Similarly, other educational groups that are specific to
states may report their own summative rating: for
example, the Connecticut Coalition for Achievement 
Now (ConnCAN) assigns A through F letter grades to 
Connecticut schools and its web pages are prominent 
in search results, even though Connecticut's state 
system provides a 0-100 index rating. 


The remaining 36 states formally provide summative 
ratings or have plans to do so for all of their schools. The
ratings the states provide can be divided into three 
groups: 

• A-F letter grades. The most common rating system
is a letter grade rating familiar to parents and 
consumers. Florida was the first state to implement 
a letter grading system; currently, 16 states use letter 
grading systems or have plans to do so. A-F grading 
systems typically have underlying continuous values 
or scales to which letter grades are subsequently 
assigned. The underlying values may or may not be 
clearly explained in accountability documentation 
or presented along with the letter grade in public 
documentation. 


• Index scores or scales. Thirteen states use
continuous scores or scales to summarize school 
performance. Most of these (10 states) range 
from 0 or 1 to 100 (or slightly more, depending 
on whether bonus points are awarded to schools 
for performance on additional measures). Three 
states use larger ranges of 0-300 (Illinois), 0-400 
(Hawaii), or 0-500 (Delaware). States that use index 
scores also typically have a classification system that 
translates the continuous score into a category that 
is used for improvement purposes. For example, 
Washington, DC, reports five categories based on 
its 0-100+ School Index Scores: Reward, Rising, 
Developing, Focus, and Priority. Other states use 
more distinctive labels. Pennsylvania, for example, 
has a unique system of colored icons associated 
with ranges of its 0-100+ scale (e.g., an upward- 
facing, hollow blue triangle is used with the highest 
range of scores). 


• Other ratings. Seven states use alternative summary 
ratings. Two states (Alaska and Nevada) use a five- 
star rating system. Three states use numerically 
defined levels: Maryland calls its five levels 
“strands,” based on an underlying z-score that 
ranges from 0 to approximately 2. Massachusetts 
uses levels 5 to 1, with level 1 being the highest, 
and color-codes each level from green (level 1) to 
red (level 5). Washington uses levels 1 (low) to 10 
(high). Kentucky uses a classification scheme based 
on a matrix of categories and resulting in categories 
such as “Intervention” and “Outstanding.” Michigan 
uses a five-color scheme as its summative rating 
(green, lime, yellow, orange, and red, with green as 
the highest level). 


These summative ratings may or may not be 
prominent on school report cards, or may be 
reported on web sites that are separate from other 
accountability indicators. For example, before 
suspending their summative rating for 2015-2016, 
South Carolina did not include their letter grade 
rating on school report cards (an effort to revamp 
South Carolina’s reporting system is outlined in 
Koon, Petscher, & Hughes, 2015). 


Despite differences in the labels attached to the 
ratings, the similarities across states are striking. 
Most states using a summative rating are using a 
five-level rating system, whether that is designated 
by stars, levels, or classes based on a continuous 
metric. In addition, the component indicators used 
to rate schools are very similar. By following NCLB 
flexibility request requirements and with the technical 
assistance of the US Department of Education, states 
prepared a variety of similar accountability plans that 
typically incorporated at least several of the following 
measures: overall achievement within a school 
(the unadjusted or absolute level of performance), 
achievement growth among the same students over 
time, achievement progress of students or the school 
toward defined goals, achievement gaps between 
key student subgroups, graduation rates, attendance 
rates, and college and career readiness results. Under 
ESSA, states have more flexibility to design their 
accountability systems, including incorporating 
additional measures beyond these core indicators. For 
example, Kentucky proposes to include opportunity 
measures relating to equitable access to gifted and 
talented programs and certified teachers (Kentucky 
Department of Education, 2017). 


Schools usually receive points for their performance 
on each measure, with their total score comprising 
a summation across measures. The score may then 
either constitute the final index (as in Hawaii) or be 
converted to a standard range or percentage (as in 
Colorado). The total possible points received for any 
given measure typically varies; some measures are 
given more weight in the total score by having more 
possible points awarded for performance in that 
domain. Some measures have multiple indicators, 
and receive points for each indicator contributing to 
the total points for that measure. For example, the 
measure of achievement usually includes separate 
components for reading and mathematics test 
performance. 


As a detailed example, Nevada's School Performance 
Framework (currently on hold pending revisions in 
response to ESSA) created a five-star rating system 
based on a 100-point scale involving achievement 
status (accounting for 20 percent of the scale and 
comprising the percentage of 10th graders meeting 
proficiency expectations and the cumulative 
percentage of 11th graders meeting proficiency 
expectations); school median growth percentile 

for 10th graders (10 percent); gap in 11th-grade 
proficiency for disadvantaged student groups (10 
percent) (each of the foregoing using both math and 
reading assessments); graduation rate (30 percent, 
including both an overall rate and an indicator for the 
gap in rates for disadvantaged student groups); college 
and career readiness (16 percent, using four different 
measures involving Advanced Placement proficiency, 
ACT and SAT test participation, advanced diploma 
rates, and Nevada college remediation rates); and two 
other measures (average daily attendance [10 percent] 
and percentage of ninth graders who are credit 
deficient [4 percent]). Nevada's star rating system 
was thoroughly explained on the state Department of 
Education's website (http://nspf.doe.nv.gov/) and was 
featured prominently on individual school pages, but 
it was not featured on a separate set of school report 
card pages also operated by the Nevada Department 
of Education (http://www.nevadareportcard.com). 
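
The weighting just described can be sketched in Python. The weights are the percentages quoted above; the function name, the convention that each measure is expressed as the fraction of its possible points earned, and the example school's values are illustrative assumptions rather than Nevada's actual computation, and the cut points that translate the index into stars are omitted because they are not given here.

```python
# Points (out of 100) assigned to each measure, as quoted in the text.
NEVADA_WEIGHTS = {
    "achievement_status": 20,
    "growth": 10,
    "gap": 10,
    "graduation": 30,
    "college_career_readiness": 16,
    "attendance": 10,
    "credit_deficiency": 4,
}


def composite_index(fraction_earned, weights=NEVADA_WEIGHTS):
    """Sum weighted points across measures.

    fraction_earned maps each measure to the share (0 to 1) of that
    measure's possible points the school earned (an assumed convention).
    """
    if set(fraction_earned) != set(weights):
        raise ValueError("measure names must match the weights")
    return sum(weights[name] * fraction_earned[name] for name in weights)


# Hypothetical school; these values are made up for illustration.
example_school = {
    "achievement_status": 0.75,
    "growth": 0.5,
    "gap": 0.5,
    "graduation": 0.9,
    "college_career_readiness": 0.5,
    "attendance": 1.0,
    "credit_deficiency": 0.5,
}
index = composite_index(example_school)  # about 72 on the 100-point scale
```

Because graduation carries 30 of the 100 points, a change in that single measure moves the index three times as much as an equal change in growth, gap, or attendance.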


Consumer-Oriented Rating Systems 


As noted, consumer-oriented ratings are produced by 
the nonprofit organization Greatschools.org as well 
as by journalistic publications and websites such as 
US News and Niche.com. Like state rating systems, 
Greatschools.org, Niche.com, and SchoolDigger.com 
rate elementary and middle schools as well as high 
schools; but US News, Newsweek, and The Washington 
Post only cover high schools. Unlike state rating 
systems, these organizations rate schools across states, 
and therefore must grapple with state tests that are 
not comparable across state lines and proficiency 
benchmarks that may be set at different levels across 
states. The typical solution is to provide some rating 
or measure that accounts for school performance 
relative to the school's state average. They may also 
incorporate comparable cross-state indicators such 

as the percent of graduating seniors who took or 
passed Advanced Placement courses. They are not 
able to take advantage of student-level data available 
in states’ unit-record longitudinal databases; instead, 
they rely on aggregate information at the school level. 


The Greatschools Rating 

The largest and most prominent independent school 
rating system is the nonprofit Greatschools.org, 
which reports its ratings on its own website as well as 
licenses its rating to other organizations (for example, 
it is featured on real estate website Zillow.com). 
Greatschools.org reports a 1-10 rating and labels 
scores 1-3 as below average, 4-7 as average, and 8-10 
as above average. In most states, ratings are based 
only on test performance relative to other schools 

in the state. Schools are rated on how well their 
students (in the aggregate) do on each test reported; 
then the individual test ratings are averaged into the 
overall 1-10 rating. In 12 states, Greatschools.org 
has additional data on academic growth and college 
readiness (which it defines as graduation rates and 
SAT or ACT performance). Schools in these states 
are also rated on these indicators relative to other 
schools in the state, and those ratings are averaged 
with test performance into an overall rating. If only 
one additional indicator is available, test performance 
and that indicator each count for half of the overall 
rating. If both additional indicators are available, 
the three components are equally weighted at 
one-third of the total. 
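
A minimal Python sketch of this averaging rule follows. The function names are hypothetical, and rounding the average to the nearest whole number before labeling is an assumption; the description above does not say how fractional averages are handled.

```python
def overall_rating(test, growth=None, college_readiness=None):
    """Equal-weight average of whichever 1-10 component ratings exist.

    With one additional indicator each component counts for one-half;
    with both, each counts for one-third.
    """
    components = [r for r in (test, growth, college_readiness) if r is not None]
    return sum(components) / len(components)


def label(rating):
    """Map a 1-10 rating to the labels quoted in the text.

    Rounding to the nearest integer first is an assumption.
    """
    rounded = round(rating)
    if rounded <= 3:
        return "below average"
    if rounded <= 7:
        return "average"
    return "above average"


tests_only = overall_rating(8)                                  # 8.0
with_growth = overall_rating(8, growth=6)                       # 7.0
all_three = overall_rating(9, growth=6, college_readiness=9)    # 8.0
```

Under this rule, the same test performance can yield a different overall rating depending on which additional indicators a state happens to report.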


US News & World Report's Best High Schools Rankings 


US News ranks a subset of high schools that meet 
certain criteria (US News & World Report, 2017). 
The ranking is based on an overall rating—a “college 
readiness index”—calculated from participation 
and passing rates on Advanced Placement (AP) and 
International Baccalaureate (IB) tests; this criterion- 
referenced measure allows cross-state comparisons 
and a ranking of the best high schools across the 
country and within individual states. However, only 
high schools that perform better than expected 
given their poverty level (using a regression analysis 
and predicted proficiency results) and better than 
their state’s average for disadvantaged subgroups are 
ranked on the college readiness index. This “gated” 
methodology excludes approximately two-thirds of 
high schools, consistent with the “best” focus of the 
rankings. For those that are ranked, US News also 
awards gold, silver, and bronze “medals” to further 
distinguish classes of high schools. 
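
The gated logic can be sketched as a filter followed by a sort. This is an illustration under stated assumptions, not US News's procedure: the field names are hypothetical, the predicted proficiency value is assumed to come from a regression fitted elsewhere, and the medal thresholds and tiebreakers are omitted.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    proficiency: float            # observed proficiency rate
    predicted_proficiency: float  # predicted from poverty level (fitted elsewhere)
    subgroup_proficiency: float   # proficiency of disadvantaged subgroups
    state_subgroup_avg: float     # state average for those subgroups
    college_readiness: float      # college readiness index, 0-100


def gated_ranking(schools):
    """Keep schools that pass both gates, then rank by college readiness."""
    passed = [
        s for s in schools
        if s.proficiency > s.predicted_proficiency
        and s.subgroup_proficiency > s.state_subgroup_avg
    ]
    return sorted(passed, key=lambda s: s.college_readiness, reverse=True)


# Hypothetical schools; all values are made up for illustration.
schools = [
    School("A", 70, 65, 55, 50, 60.0),
    School("B", 80, 85, 60, 50, 90.0),  # fails the first gate
    School("C", 75, 70, 52, 50, 75.0),
    School("D", 90, 80, 45, 50, 95.0),  # fails the second gate
]
ranked = [s.name for s in gated_ranking(schools)]  # ["C", "A"]
```

Schools B and D have the highest college readiness values in this example but never reach the ranking stage, which is what distinguishes a gated evaluation from a composite score.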


Newsweek’s America’s Top High Schools 


Newsweek produces two ranked lists of top high 
schools in the United States: an “absolute” list based 
on unadjusted performance on state tests (relative 
to other schools within each state) and a “relative” 
list that takes into account the percentage of low- 
income students (Newsweek, 2015). As the first 

step in constructing each list, Newsweek also uses 

a “gated” method. For the absolute list, Newsweek 
uses a “threshold” analysis to identify the top 20 
percent of high schools in each state. For the relative 
list, it identifies those schools that perform half of 

a standard deviation above the state average after 
controlling for poverty. Once these schools are 
identified, Newsweek uses the results of its own high 
school survey to construct a college readiness score 
based on enrollment rate (accounting for 25 percent 
of the total score); graduation rate (20 percent); 
weighted AP/IB composite (17.5 percent); weighted 
SAT/ACT composite (17.5 percent); holding power, 
which is the change in enrollment between 9th 

and 12th grades (10 percent—this last component 
comes from the Common Core of Data, not 
Newsweek’s survey); and counselor-to-student 

ratio (10 percent). In this regard, their ranking is 
similar to state composite indices. However, schools 
without sufficient data from the survey, or that are 
nonrespondents, are not included. In 2015—the 
most recent year for which a methodology report 
was released—Newsweek reported that their survey 
response rate was 34 percent for schools on the 
relative list and 42 percent for schools on the absolute 
list. 


Other Rankings 


Given the widespread availability of data 
from the states and from the federal government, 
there is no shortage of school ratings or rankings 
available to the public. Others worth mentioning 
include The Washington Post’s Most Challenging 
High Schools index, which ranks schools on a 
“Challenge Index,” the ratio between the number of 
AP, IB, and Advanced International Certificate of 
Education (AICE) tests taken in a given school year 
and the number of seniors who graduate in May or 
June (Matthews, 1998). Niche.com also offers high 
school ratings, and incorporates multiple measures 
drawn from federal data as well as data from Niche 
users (students, alumni, and parents) who provide 
feedback at their website. Their letter-grade rating 
(A-D) incorporates measures of academics, health 
and safety, student culture and diversity, teachers, 
resources and facilities, clubs and activities, sports 
and fitness, and opinion items from survey responses. 
Each measure has its own subcomponents and 
variable weighting. The underlying scale is a z-score 
created from the z-scores of each individual measure. 
SchoolDigger.com offers ratings of elementary 

and secondary schools based on normalizing and 
averaging schools’ state-reported test performance, 
similar to Greatschools.org. 


Issues in the Design of School Ratings 
Systems 


The overview of state and consumer-oriented ratings 
systems presented above demonstrates a wide variety 
of methodologies and modes of presentation for high 
school ratings, despite underlying similarities in the 
reliance on a core set of metrics related to student 
test scores and a few other immediate “output” 
measures like graduation rates. The variety of ways 
in which ratings are presented reinforces the idea 
that presentation is as consequential as measures 
and methods. Indeed, in their work Organizational 
Report Cards, Gormley and Weimer (1999) argue 
that there are six key aspects of comparative 
assessments: validity, comprehensiveness, relevance, 
comprehensibility, reasonableness, and functionality. 
Tellingly, only the first feature relates to the technical 
qualities of the measures themselves; the remaining 
five features relate to how ratings systems effectively 
summarize and communicate useful information to 
or about the organizations being assessed. Likewise, 
Colyvas (2012) has criticized prior work concerning 
what she terms performance metrics for focusing 
too closely on the statistical properties of measures. 
Colyvas draws on Stinchcombe’s (2001) framework 
for analyzing formal systems to articulate features 
(several similar to Gormley and Weimer’s list) that 
have a significant bearing on the meaning and 

use of such measures. Reyna (2016) also recently 
discussed key features of school ratings systems in the 
environment of ESSA. 


To further understanding of the importance of 
nontechnical features in the design of school ratings 
systems, the following discussion elaborates on 

the broader measurement, methodological, and 
presentation choices made in constructing them. 
Understanding these choices can help advance 
knowledge of the limitations and strengths of 
school ratings in guiding school improvement and 
informing parents and the public about the schools 
their children attend. Drawing on and consolidating 
the aspects discussed at length by Gormley and 
Weimer (1999), Colyvas (2012), Stinchcombe (2001), 
and Reyna (2016), the discussion focuses on the 
implications of four design features of school rating 
systems: (1) measurement, (2) transformation, 
(3) integration, and (4) presentation. 


Measurement 


The movement toward educational accountability 
has mainly stressed outcomes measures: primarily 
achievement on standardized assessments and, at 

the high school level, attainment (Martin et al., 

2016; Mikulecky & Christie, 2014). However, there 
are multiple types of data on school performance 
that could be measured and which could provide 
useful information to practitioners and families. 
Gormley and Weimer (1999), for example, describe 
inputs, processes, outputs, and outcomes as 

elements of organizational performance that can 

be measured and reported on. Their “outputs” are 
what are often labeled “outcomes” in the educational 
field—immediate or short-term results such as test 
performance. “Outcomes,” in their terminology, refer 
to longer-term results that reflect the ultimate mission 
of an organization, such as successfully enabling a 
transition to college and providing the basis for good 
work and family habits. Although “college and career 
readiness” outcomes are often included in state rating 
systems (Martin et al., 2016), they are typically only 
measured through coursetaking or college admissions 
test results that are, in fact, immediate or short- 

term results, not explicit evidence of college or work 
success (Mikulecky & Christie, 2014). 


In addition to outputs and outcomes, inputs are also 
an important aspect of organizational performance, 
reflecting organizational competencies in securing 
resources to meet needs. These aspects of school 
performance are not regularly reported on school 
report cards nor included in accountability systems, 
but may be reported independently in financial 
and budgetary documents. There is certainly 
far less regular, systematic reporting of school-level 
organizational resources (e.g., number, 
age, and diversity of textbooks; amount, quality, 
and appropriateness of high school laboratory 
equipment), and very little attention has been paid 
to how to achieve consistency in reporting on and 
informing the public about school resources. 


Likewise, process measures have received very scant 
attention within school ratings systems, yet are clearly 
the critical component in what makes schools work 
for students and their families. Because processes are 
among the most difficult aspects of organizations to 
measure, and their complexity is most in danger of 
being reduced to unfairly simplistic metrics, it is 
understandable that measures of process (such as 
teacher professional development, instructional 
improvement plans, and discipline procedures) are 
rarely incorporated into school rating systems. An 
exception is Kentucky’s 
“Unbridled Learning” system, which incorporates 
program reviews and professional ratings into their 
accountability system, if not into their school ratings. 


To be fair, accountability policies never intended to 
encompass inputs and processes—these were in many 
ways assumed to adjust to meet accountability goals 
(Hoffer, 2000)—but the expansion of accountability 
reporting into full-scale school rating systems 
incorporating many measures, and tied more 

strongly to purposeful systems of intervention 

and improvement, implies that school leaders and 
policymakers need access to higher-quality data 

on more than just short-term outputs or even 
long-term outcomes. Standardized school-level 
summaries of instructional materials and technology; 
facilities; teacher and leadership evaluations; course 
assignment and access policies; and discipline and 
safety procedures and practices—among many 

other possible options—deserve their place in rating 
systems that purport to provide an overview of school 
quality and performance. 


Transformation 


The selection of domains to measure and specific 
indicators to use leads to a second step involving the 
transformation of data to make measures amenable 
for analysis and ready for inclusion in a summary 
score or rating. Depending on the issuer of a school 
rating, measures may be included that reflect absolute 
or unadjusted performance (for example, overall 
achievement results) or conditioned measures that 
account for the “inputs” involved in their production 
(such as predicted achievement levels, academic 
growth measures at the individual level, or change 

in academic achievement at the school or student 
subgroup levels). 


Indeed, Gormley and Weimer place a strong 
emphasis on the necessity of making appropriate “risk 
adjustments”—that is, controlling for differences in 
populations served and prior histories. Such statistical 
adjustments are critical to ensuring the validity 

of organizational performance metrics, which, 
according to Gormley and Weimer, should measure 
the contributions of the organization itself and not 
the educational or socioeconomic backgrounds 

of students. This includes, in the school context, 
removing the influence of peers and the composition 
of the student body, long shown to have an effect on 
individual student achievement (Wilkinson, 2002; 
Sacerdote, 2014). Statistical controls can be relevant 
for cross-school comparisons by addressing factors 
outside of the school’s control that are known to affect 
student performance, or for same-school comparisons 
over time by accounting for prior performance level 
and changing student composition across cohorts. 
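
One simple form of risk adjustment is to regress school performance on a measure of student disadvantage and treat each school's residual as its adjusted performance. The sketch below assumes a single predictor (percent of students in poverty) and ordinary least squares; the function names and data are hypothetical, and operational systems typically use richer models with student-level data.

```python
def ols_fit(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjusted_performance(poverty, proficiency):
    """Residuals: observed proficiency minus proficiency predicted from poverty."""
    intercept, slope = ols_fit(poverty, proficiency)
    return [y - (intercept + slope * x) for x, y in zip(poverty, proficiency)]


# Hypothetical data for four schools (percent in poverty, percent proficient).
poverty = [10, 30, 50, 70]
proficiency = [85, 80, 60, 55]
residuals = adjusted_performance(poverty, proficiency)
# residuals are approximately [-1.5, 4.5, -4.5, 1.5]
```

In this made-up example the school with the lowest unadjusted proficiency (55 percent) has a positive residual, while the school with the highest (85 percent) has a negative one, so adjusted and unadjusted measures order the two schools differently.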


However, while Gormley and Weimer note that 
validity must be balanced by comprehensibility, their 
emphasis on modeling of educational outcomes to 
abstract out organizational performance ignores the 
function that peer groups or student composition 
might play in some interpretations of school ratings. 
That is, risk adjustment may be seen by some 

ratings consumers as an unnecessary confusion or 
distraction. Unadjusted performance could be very 
meaningful and valid to parents who believe in the 
power of peers and the importance of school climate 
for fostering good educational habits. Even those who 
speak out from concern about subgroup performance 
may decide that adjustments hide performance 
problems and result in setting low expectations for 
underperforming groups (Gormley & Weimer, 1999, 
p. 77). 


Fortunately, most school ratings systems combine 
some measure of unadjusted achievement with an 
adjusted measure of growth or progress to both 
provide credit to schools for their performance as an 
organization given resource constraints or population 
deficits and to acknowledge the public’s concern with 
overall performance and high standards. Indeed, the 
inclusion of performance gaps in most school rating 
systems indicates that the tide has shifted away from 
earlier performance reporting thinking (as illustrated 
by Gormley & Weimer, 1999) that schools should be 
judged relative to the challenges they face toward a 
system in which schools are rewarded for across-the- 
board (absolute) levels of achievement. 


The other principal data transformation that has an 
important consequence on school ratings results 

is whether the data are preserved as a criterion- 
referenced measure (e.g., a graduation rate measured 
as 0 to 100) or are standardized into relative metrics 
of percentiles, z-scores, or relative ranks. This 
dimension of relativity is treated differently in state 
school rating systems and consumer-oriented rating 
systems. In the former, schools are typically given 
credit for the extent to which they meet external 
benchmarks—e.g., percent proficient on math and 
reading assessments, or graduation rate—and all 
schools can theoretically attain the highest possible 
rating. In the latter, consumer-oriented ratings, 
including Greatschools.org and US News rankings, 
schools are given a relative rating, inherently limiting 
the number of schools that can receive top ratings. 
In Greatschools.org’s case, schools are rated relative 
to others in the state on their final 1-10 rating. 
In the US News rankings, schools are given a final 
ranking nationally (and by state) but are also given 

a criterion-referenced score on its “college readiness 
index,” on which all schools can (mathematically) 
achieve a top score. 
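
The contrast between the two approaches can be illustrated with a short Python sketch. The 95 percent benchmark, the function names, and the graduation rates are hypothetical; percentile rank is defined here as the percentage of schools scoring strictly below a given school, which is only one of several common definitions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values against the group mean and standard deviation."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def percentile_ranks(values):
    """Percent of schools scoring strictly below each school."""
    n = len(values)
    return [100 * sum(other < v for other in values) / n for v in values]


# Hypothetical graduation rates for four schools.
grad_rates = [96, 97, 98, 99]

# Criterion-referenced: every school can meet the (assumed) benchmark.
meets_benchmark = [rate >= 95 for rate in grad_rates]

# Norm-referenced: the same schools are spread across the distribution.
ranks = percentile_ranks(grad_rates)  # [0.0, 25.0, 50.0, 75.0]
standardized = z_scores(grad_rates)
```

All four schools meet the benchmark, yet the relative metrics place one of them at the bottom of the distribution even though it trails the top school by only three percentage points.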


The different purposes of state and consumer- 
oriented ratings partially account for differences in 
the use of relative metrics. State ratings are designed 
for accountability and improvement purposes; 
consumer publications are designed to clearly 
differentiate schools to help guide relocation or 
school choice decisions and drive media attention. 
However, reward or incentive programs (such as 
South Carolina’s school incentive reward program; 
Gormley & Weimer, 1999, pp. 77-80) conducted by 
states, districts, or independent agencies (such as the 
Broad Foundation) also often use relative metrics 

as a way to create distinctions for the purposes of 
identifying single winners or clear dividing lines for 
groups of awardees. 


Integration 


Another significant design feature of school rating 
systems is the manner in which multiple measures 
are jointly used or integrated. At the broadest 

level is the question of whether multiple measures 
are integrated within a single unidimensional 

rating or if measures are kept separate as part 

of a multidimensional evaluation of quality and 
performance. Multidimensional rating systems 
include approaches such as the balanced scorecard 
or any of the currently 14 state accountability 
systems that provide no unitary score or grade. 
Multidimensional rating systems are better suited for 
internal purposes of evaluation and improvement, 

as they demand greater knowledge of individual 
measures or domains of measurement; they are less 
accessible to outside observers such as parents or state 
or federal policymakers (see Gormley and Weimer’s 
discussion of “comprehensibility”). 


Even if the school rating system is holistic in the sense 
that it incorporates measures across multiple domains 
and employs some element of statistical adjustment 
for inputs, if the final result is a score, grade, or 

rank, the rating is a unidimensional evaluation 
metric. Among unidimensional ratings, there is the 
aforementioned distinction between gated evaluations 
(e.g., US News’s Best High Schools rankings) and 
composite scoring (used by most states). Gated 
evaluations are ideal for systems whose purpose is to 
identify top-performing schools for the purposes of 
distinction or reward. Composite scoring serves the 
need for comprehensive performance evaluations of 
all schools for accountability and improvement. 


There also remains the question of the construction 
of composite scoring systems, which can follow 
several different approaches. The construction 

most commonly employed by state rating systems 
and some consumer-oriented publications 
(Greatschools.org and Niche.com) is a weighted 
scheme, in which each measure contributes a varying 
proportion of a final index or score. Weighting has 
strong advantages because it allows multiple measures 
to contribute to a rating (in contrast to the “all or 
nothing” results of NCLB’s AYP provisions) while 
emphasizing measures deemed most significant, but 
there are few standard mechanisms for setting relative 
weights other than qualitative processes of expert 
review, public comment, and political input (Reyna, 
2016). For example, ESSA stipulates only that states 
are required to give “substantial weight” to a set of 
academic indicators in designing their accountability 
systems, and that these academic indicators receive 
more weight than school quality or student success 
measures (Martin et al., 2016). Indeed, the lack of 
clear procedures for determining weights means 

that some ratings systems are subject to ongoing 
debate that may be influenced by political or other 
noneducational concerns (e.g., as in North Carolina— 
see Antoszyk, 2017). 


There are also systems in which measures could 
contribute to placement in a categorical level or 
rating based on decision rules tied to multiple 
measures instead of a single continuous score or 
index. The special case here is Texas's Performance 
Index system, which although not providing an 
overall continuous summary score, computes index 
values for four domains (achievement, progress, 
gaps, and postsecondary readiness) and then 

assigns schools a “met standard” or “improvement 
required” classification based on surpassing target 
scores on each dimension. An alternative method 
could combine weighted composites with categorical 
label adjustment by adjusting a categorical rating 
determined by a continuous score upwards or 
downwards depending on additional measures, such 
as level of test participation (Reyna, 2016). 
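
Both approaches in this paragraph can be sketched as decision rules. The domain names follow the description above; the target values, the 95 percent participation minimum, the one-level downgrade, and the function names are hypothetical, and treating a score equal to its target as meeting it is also an assumption.

```python
DOMAINS = ("achievement", "progress", "gaps", "postsecondary_readiness")


def classify(index_values, targets):
    """Label a school 'met standard' only if every domain reaches its target."""
    meets_all = all(index_values[d] >= targets[d] for d in DOMAINS)
    return "met standard" if meets_all else "improvement required"


LEVELS = ["F", "D", "C", "B", "A"]  # ordered from lowest to highest


def adjust_grade(grade, participation_rate, minimum=0.95):
    """Lower a score-based letter grade one level if participation is too low."""
    position = LEVELS.index(grade)
    if participation_rate < minimum and position > 0:
        position -= 1
    return LEVELS[position]


# Hypothetical targets and school values, for illustration only.
targets = {"achievement": 60, "progress": 30,
           "gaps": 28, "postsecondary_readiness": 57}
school = {"achievement": 72, "progress": 35,
          "gaps": 25, "postsecondary_readiness": 70}

status = classify(school, targets)    # "improvement required" (gaps below target)
adjusted = adjust_grade("B", 0.90)    # "C"
```

Unlike a weighted composite, the first rule is non-compensatory: strong performance in three domains cannot offset a shortfall in the fourth.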


Presentation 


Presentation is one of the key aspects of rating 
systems and performance metrics that both 
Stinchcombe (2001) and Gormley and Weimer 
(1999) stress. As an aspect of comprehensibility (in 
Gormley and Weimer’s term) or communicability 
(in Stinchcombe’s term), presentation encompasses 
both the naming or labeling of the rating and any 
associated categories as well as the manner in which 
it is disseminated. Prior research has shown that 
parents and the public respond to the presentation 
of performance data and the ways that performance 
data are summarized. Hastings and Weinstein (2008) 
showed that parents were more likely to select higher- 
performing schools when given performance data. 
Jacobsen and colleagues (2014, p. 17) found that letter 
grades have a strong influence on public perceptions 
of school performance: 


For the strong school, the letter grade format leads 
to strongly positive views of school performance, 
while other formats lead to positive but more 
mediocre views of school performance. While 

we may have expected the numerical formats 

to be more influential due to cultural notions 
about numbers being accurate and precise...our 
consistent and strong results for the letter grade 
condition suggest that other cultural measures can 
have a much stronger influence. It appears people 
believe an A indicates very high performance and 
a C indicates quite low performance, while the 
numbers are more ambiguous. 


Similarly, school rankings—which, unlike ratings 
such as A-F grades, explicitly sort schools into higher 
or lower positions—may suggest larger differences 
between schools than the underlying measure 
actually indicates. US News, for example, uses a series 
of tiebreakers to account for the fact that many of the 
highest-ranked schools earn the maximum college 
readiness index score of 100. Further, the actual 
distinction between schools separated by the same 
number of positions (i.e., rank 10 and rank 20, versus 
rank 100 and rank 110) may be variable. Although 
ratings are not immune to misinterpretation as to the 
actual underlying performance of schools—indeed, 
scores, grades, or other categories of performance 
have an implicit hierarchy—rankings reify 
performance distinctions that may not be meaningful 
or clear-cut. 
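
A small Python sketch shows how equal rank gaps can conceal very different score gaps. The scores and the function name are hypothetical, and ties are left in place rather than broken, because the tiebreakers that publishers apply are not described here.

```python
def rank_with_ties(scores):
    """Competition ranking: schools with equal scores share the same rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(score) + 1 for name, score in scores.items()}


# Hypothetical index scores on a 0-100 scale.
scores = {"A": 100.0, "B": 100.0, "C": 99.8, "D": 80.0}
ranks = rank_with_ties(scores)  # {"A": 1, "B": 1, "C": 3, "D": 4}

# Score differences behind the rank differences.
gap_b_to_c = scores["B"] - scores["C"]  # two rank positions, about 0.2 points
gap_c_to_d = scores["C"] - scores["D"]  # one rank position, about 19.8 points
```

School C sits two positions below the tied leaders while trailing them by a fraction of a point, whereas the single position separating C from D corresponds to a difference of nearly 20 points.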


Likewise, the manner of promotion or dissemination 
is signally important; ratings that are not visible 

or easily accessible will receive scant attention. 

For example, some states do not currently include 
available ratings on their school report cards (such 
as Rhode Island’s Composite Index); or they may 
provide ratings through separate websites or reports 
(such as Hawaii’s rating, which is only presented on 
its more detailed “School Status and Improvement 
Reports”). Even within states that include summary 
ratings, their prominence can differ sharply: 
Oklahoma's and New Mexico’s letter grades, for 
example, are featured in large font on the top of the 
first page of their school report cards. The chosen 
presentation can deemphasize or promote the school 
rating and profoundly affect how students, parents, 
educators, and the public think about their local 
schools and the public school system overall. 


Table 2 presents a summary of some of these key 
features for the major high-profile high school 
rating systems. For comparison, the required ESSA 
accountability system, the historical NCLB system, 
and the prize funds rating systems (e.g., the Broad 
Foundation and the Tennessee SCORE Prize) 

are also included. NCLB instituted a system of 
school-adjusted accountability designations based 
on meeting benchmarks, primarily on a single 
dimension of test performance (but also including 
graduation rate at the high school level; because of 
the emphasis on test scores and to distinguish NCLB 
more clearly from subsequent state accountability 
systems under granted NCLB waivers, NCLB is 
labeled a unidimensional system). The prize funds 
also do not typically create a single composite rating 
or score, but rank schools or districts internally on a 
variety of metrics to identify potential award winners 
and vet finalists. 


These design features describe methodological 
choices that must be made in creating ratings for 
schools. Of course, these choices are not the only 
ones, nor necessarily the most consequential ones, 
involved in designing school ratings or scores. For 
example, the first and foremost process is generating 
individual measures themselves, whether drawn 
from existing databases or created during a survey 
or other process of data collection. These processes 
are not unbiased or error-free, and any adequate 
rating system must consider the role and impact of 
known or suspected problems with the validity and 
reliability of measures to be included. A second and 
vitally important set of choices informs how the 
rating system feeds back into decision-makers’ and 
educators’ actions—the “reactive” process discussed 
by Espeland and Sauder (2007). Depending on the 
purpose of the ratings, the issuer, the nature of its 
design, and how ratings are used by stakeholders, 
systems must take into account how practices may be 
amplified or distorted through the highlighting effect 
of the ratings and their measures. These and other 
processes are not design choices, but can be equally or 
more important factors in the development, use, and 
effectiveness of school ratings systems. 


Table 2. Design features of school ratings systems 


System/issuer               Type of rating           Adjustments    Relativity            Dimensionality    Rating methodology
--------------------------  -----------------------  -------------  --------------------  ----------------  ------------------
NCLB requirements           Accountability category  School-level   Criterion-referenced  Unidimensional    Benchmarking
States with NCLB waivers
  Without a summary rating  Accountability category  Student-level  Criterion-referenced  Multidimensional  Benchmarking
  With a summary rating     Composite rating plus    Student-level  Criterion-referenced  Multidimensional  Variable weighting
                            accountability category
ESSA requirements           Composite score plus     Student-level  Criterion- and/or     Multidimensional  Variable weighting
                            accountability category                 norm-referenced
Greatschools.org¹           Composite score          None           Norm-referenced       Unidimensional    Summary scale
Niche.com                   Composite score          None           Norm-referenced       Multidimensional  Variable weighting
US News & World Report      Ranking plus index       School-level   Norm-referenced       Multidimensional  Gated
Newsweek                    Ranking plus index       School-level   Norm-referenced       Multidimensional  Gated
Prize funds²                Award                    School-level   Norm-referenced       Multidimensional  Gated


Notes: ESSA = Every Student Succeeds Act; NCLB = No Child Left Behind. 
1 Refers to main Greatschools.org rating, not ratings based on additional measures for certain states that have provided Greatschools data beyond achievement data.
2 Includes Tennessee SCORE Prize and Broad Foundation Prizes.
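
The relativity and rating-methodology columns of Table 2 can be illustrated with a minimal sketch. It is not the method of any system listed in the table: the school names, measures, cutoffs, and weights below are hypothetical values chosen only to show how a criterion-referenced rating, a norm-referenced rating, a variably weighted composite, and a gated ranking differ in construction.

```python
# Hypothetical school records; all names and numbers are illustrative only.
schools = {
    "A": {"proficiency": 0.82, "grad_rate": 0.95},
    "B": {"proficiency": 0.64, "grad_rate": 0.88},
    "C": {"proficiency": 0.71, "grad_rate": 0.79},
    "D": {"proficiency": 0.55, "grad_rate": 0.91},
}


def criterion_referenced(school, cutoff=0.70):
    """Rate one school against a fixed standard (assumed cutoff).

    The result does not depend on how other schools perform.
    """
    return "meets" if school["proficiency"] >= cutoff else "does not meet"


def norm_referenced(all_schools, measure="proficiency"):
    """Rate each school relative to its peers.

    Returns the share of schools scoring at or below each school,
    so the result changes whenever the comparison group changes.
    """
    values = [s[measure] for s in all_schools.values()]
    return {
        name: sum(v <= s[measure] for v in values) / len(values)
        for name, s in all_schools.items()
    }


def weighted_composite(school, weights):
    """Combine several measures into one score using assumed weights."""
    return sum(weight * school[measure] for measure, weight in weights.items())


def gated_ranking(all_schools, gates, rank_by):
    """Drop schools that fail any pass/fail gate, then rank the survivors."""
    survivors = [
        name
        for name, s in all_schools.items()
        if all(s[measure] >= threshold for measure, threshold in gates.items())
    ]
    return sorted(survivors, key=lambda n: all_schools[n][rank_by], reverse=True)
```

Under these assumed values, school C meets the fixed proficiency cutoff yet is excluded from a gated ranking that requires a graduation rate of 0.85, which shows how the same data can yield different impressions depending on the methodology chosen.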
Conclusion and Implications 


This article has reviewed state and consumer-oriented 
rating systems, situating them in the broader context 
of comparative organizational assessments and the 
research literature surrounding their use and impact. 
It has also discussed key design features and their 
implications for use and interpretation by educators, 
families, and policymakers. 


Although school ratings do not necessarily inform 
parents about the quality of the instruction a child
is likely to receive from individual teachers and
although ratings are not pure representations of 
organizational performance given schools’ student 
populations and resource constraints, they can 
provide useful information for guiding improvement 
and helping families make schooling choices. School 
rating systems are influential with school and 
district administrators, who pay close attention to 
public perception, and with parents, who use them 
as a guide for residential location and enrollment 
decisions. Because of school rankings’ visibility
and influence on policymaking and practice, their
content, design, and presentation are important for 
educators and users to understand. 


However, more research needs to be conducted on 
the construction and validity of school rating systems. 
As relatively new measures in the organizational 
assessments landscape, school ratings—particularly 
those supplied by states—do not have a large 
literature evaluating whether their ratings or
rankings correlate with desired student outcomes
or have unintended effects (see the literature on
college rankings, e.g., Bastedo & Bowman, 2009;
McDonough, Lising, Walpole, & Perez, 1998;
Meredith, 2004). Although accountability systems
in general (particularly NCLB) have been studied
extensively (e.g., de Wolf & Janssens, 2007; Dee, 
Jacob, & Schwarz, 2013), there is a need for additional 
research into the effects of summative ratings and 
rankings on student performance, school and teacher 
practices, and perceptions of parents. 


A national dialogue about appropriate local, state,
and federal accountability systems should draw
on comparative analyses of different systems to
identify promising improvements and remedy
potential flaws. The shift away from a single federally
designed accountability system to a laboratory of
varying rating systems among the states provides
an opportunity to analyze the processes associated
with constructing, maintaining, and improving 
school-level accountability systems. Combined with 
the rapid growth in education data and student- 
level longitudinal databases within states, this shift 
suggests the need for a new wave of studies about the 
relationship between educational governance and 
school performance—studies that may be able to 
show more conclusive results than those conducted 
in the nascent stages of national accountability (e.g., 
Hanushek & Raymond, 2005). 


In addition, the overlap between prominent 
consumer-oriented ratings like Greatschools.org 
and state rating systems deserves further attention, 
given the potential for public confusion caused by
the availability of multiple ratings and the likelihood
that some groups of users draw their impressions of
school performance and quality from some sources
rather than others. The role that various circulating
ratings play in shaping the attitudes of different
socioeconomic or racial/ethnic groups of parents
requires further investigation. Such investigation
would benefit those who manage educational systems
and those who are publicly committed to supporting
and strengthening schools.


Finally, in creating ratings that improve schools and 
bolster public education, both public agencies and 
private enterprises have a role to play. Gormley and 
Weimer (1999), in the conclusion of their work on 
organizational report cards, note that the public and
private sectors each have strengths to contribute
to the dissemination of high-quality information
to the public. Public agencies can marshal the
resources and will to gather and validate the data 
necessary for institutional evaluations, while 
private, media-oriented businesses have expertise in 
creating compelling presentations and promoting 
wide dissemination of results. Although their aims 
differ, and the goals of different levels of educational 
policy and governance may differ as well, each has
a compelling interest in providing meaningful,
actionable data that can lead to an improved quality 
of life for children and their parents. 